Abstract

We present a constraint-based case frame lexicon architecture for bi-directional mapping between a syntactic case frame and a semantic frame. The lexicon uses a semantic sense as the basic unit and employs a multi-tiered constraint structure for the resolution of syntactic information into the appropriate senses and/or idiomatic usage. Valency-changing transformations such as morphologically marked passivized or causativized forms are handled via lexical rules that manipulate case frame templates. The system has been implemented in a typed feature structure system and applied to Turkish.

1 Introduction 

Recent advances in theoretical and practical aspects of feature and constraint-based formalisms for representing linguistic information have fostered research on the use of such formalisms in the design and implementation of computational lexicons (Briscoe et al., 1993). The case frame approach has been the representation of choice, especially for languages with free constituent order, explicit case marking of noun phrases, and embedded clauses filling nominal syntactic roles. The semantics of such syntactic role fillers are usually determined by their lexical, semantic, and morphosyntactic properties rather than by their position in the sentence. In this paper, we present an approach to building a constraint-based case frame lexicon for use in natural language processing in Turkish.

A number of observations that we have made on Turkish indicate that we have to go beyond the traditional transitive/intransitive distinction and use a framework in which verb valence is considered as the obligatory co-existence of an arbitrary subset of possible arguments, along with the obligatory exclusion of certain others, relative to a verb sense. Additional morphosyntactic, lexical, and semantic selectional constraints are used to map a given syntactic argument structure to a specific verb sense.

In recent years, there have been several studies on constraint-based lexicons. Russell et al. (1993) propose an approach to multiple default inheritance for unification-based lexicons. Lascarides et al. (1995) suggest an ordered approach to default unification, and de Paiva (1993) formalizes the system of well-formed typed feature structures. In that study, type hierarchies and relations are defined mathematically; unification and generalization operators between feature structures are also formalized, along with the notion of well-formedness that we use in our system.

2 Representing Case Frame Information

In Turkish (and possibly in many other languages), verbs often convey several meanings (some totally unrelated) when they are used with subjects, objects, oblique objects, and adverbial adjuncts with certain lexical, morphological, and semantic features and co-occurrence restrictions. In addition to the usual sense variations due to selectional restrictions on verbal arguments, in most cases the meaning conveyed by a case frame is idiomatic, with subtle constraints. For example, the Turkish verb ye (eat), when used with a direct object noun phrase whose head is:

1. para (money), with no case or possessive markings and a human subject, means to accept a bribe,

2. para (money), with a non-human subject, 
means to cost a lot, 

3. para (or any other NP whose head is ontologically IS-A money, e.g., dolar, mark, etc.) with obligatory accusative marking and optional possessive marking, means to spend money,

4. kafa (head) with obligatory accusative marking and no possessive marking, means to get mentally deranged,

5. hak (right) with optional accusative and possessive markings, means to be unfair,



6. baş (head, cf. 4) (or any NP whose head is ontologically IS-A human) with optional accusative and optional possessive marking (obligatory only with baş), means to waste or demote a person.

On the other hand: 

1. if an ablative case-marked oblique object denoting an edible entity is present, then there should not be any direct object, and the verb means to eat a piece of (the edible oblique object), or

2. if the ablative case-marked oblique object does not denote something edible but rather a container, then the verb means to eat out of (the container), with the optional direct (edible) object denoting the object eaten.

Clearly, such usage has an impact on the assignment of thematic roles to the various role fillers, and even on the syntactic behavior of the verb in question (Briscoe and Carroll, 1994). For instance, for the third and fourth cases above, where the object has to be obligatorily case-marked accusative, a passive form would not be grammatical for the sense conveyed, although syntactically ye (eat) is a transitive verb.

Sometimes verbs require different combinations of arguments, or explicitly require that certain arguments not be present. For instance, the verb şaş requires different kinds of arguments depending on the sense, obligatorily excluding other arguments:

1. with an ablative case-marked oblique object and no other object in the case frame, şaş means to deviate from,

2. with a dative case-marked oblique object and no other object, şaş means to be surprised at,

3. with an accusative case-marked direct object and no other object, şaş means to be confused about.

As a final example, when the verb tut (catch/hold) is used with obligatory 3rd person singular agreement and active voice, and the subject is a (nominalized) S with a verb form of future participle, then the sense conveyed by the top-level case frame is to feel like doing the predication indicated by the subject S's case frame, with the agent being the subject of this embedded clause.

As these examples illustrate, verb sense and idiomatic usage resolution has to be dealt with in a principled way and not by pattern matching (e.g., as in Tschichold (1995)): in a language with free word order, pattern-matching approaches could fail. In this paper, we present a unification-based approach to a constraint-based case frame lexicon, in which a single mechanism deals with both problems uniformly. The essential function of our lexicon is to map bidirectionally between a case frame containing syntactic information and a semantic frame that captures the predication denoted by the case frame, along with information about who fills what thematic role in that predication.

3 The Lexicon Architecture 

In this section we present an overview of the structure of lexicon entries and the nature of the constraints. The basic unit in the lexicon is a sense, which is the information denoting some indivisible predication along with the thematic roles involved. We generate the case frame of each sense by unifying a set of co-occurrence, morphological, syntactic, semantic, and lexical constraints on verbs and their arguments. The lexicon is implemented in TFS (Kuhn, 1993) as the disjunction of the senses defined by unifying wf-case-frame (well-formed case frame) with each sense:

wf-case-frame < case-frame.
wf-case-frame & SENSE#1.
wf-case-frame & SENSE#2.
...
wf-case-frame & SENSE#n.

3.1 Lexicon Entries 

Each verb sense entry in the lexicon has the structure shown by the feature structure matrix in Figure 1.



   [CAT:     V
    STEM:    verbal-root
    SUBJ:    [1] [...]
    DIR-OBJ: [2] [...]
    DAT-OBJ: [...]
    ABL-OBJ: [...]
    AGENT:   [1]
    THEME:   [2]]

Figure 1: Structure of a case frame lexicon entry.

The feature structure for each syntactic argument contains information about the morphological and syntactic structure of the syntactic constituent, such as part of speech, agreement, case, and possessive markers, and, for embedded S's, additional morphological markings such as verb form (e.g., infinitive, participle, etc.) and voice (e.g., active, passive, causative, reflexive, reciprocal, etc.), along with their own case frames. This structure is similar to the structure proposed in Lascarides et al. (1995). However, instead of classifying argument structures as simply transitive, intransitive, etc., we need to consider all relevant elements of the power set of possible arguments. For Turkish, the syntactic constituents that we have chosen to include in the argument slot (for a verb in active voice) are the following:

• subject (nominative NP),¹

• direct object (nominative or accusative case-marked NP),

• oblique objects (ablative, dative, locative case-marked NP),

• beneficiary object (dative case-marked NP, or PP with a certain PFORM),

• instrument object (instrumental case-marked NP or PP with a certain PFORM),

• value object (dative case-marked NP or PP with a certain PFORM).

In general, there may be more than one instantiation of the SEM frame for a given instantiated set of case frame arguments (and vice versa). For instance, for the ye verb discussed above, the argument structure of the fourth case, which gives rise to the meaning to get mentally deranged, may conceivably give rise to a literal meaning in a rather improbable context (such as eating the head of a fish at dinner, much in the spirit of the two interpretations of the English idiom kick the bucket). Conversely, the same semantics may be expressed by a different surface form.

3.2 Constraint Architecture 

We express constraints on the arguments in the case frame of a verb via a five-tier constraint hierarchy, sharing constraints among the specifications of other constraints and sense definitions whenever possible:

¹ NPs that have no case marking in Turkish.



1. Constraints on verb features that describe any relevant constraints on the morphological features of the verb, such as agreement or voice markers.

2. Constraints on morphological features that describe any obligatory constraints on the arguments, such as case marking, verb form (in the case of embedded clauses), etc.

3. Constraints on argument co-occurrence that express obligatory argument co-occurrence constraints, along with constraints that indicate when certain arguments should not occur, in order to resolve a sense.

4. Lexical constraints that indicate any specific 
constraints on the heads of the arguments in 
order to convey a certain sense, and usually 
constrain the stem of the head noun to be a 
certain lexical form, or one of a small set of 
lexical forms. 

5. Semantic constraints that indicate semantic selectional restrictions, which may be resolved using a companion ontological database (again implemented in TFS) in which we model the world by defining semantic categories such as human, thing, nonliving object, living object, etc., along the lines described by Nagao et al. (1985).
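Item 5 relies on IS-A queries against the ontology, like the para/dolar/mark case in Section 2. As a minimal Python sketch, assuming a tiny hand-written hierarchy (the PARENT and CATEGORY_OF tables and the helper names are hypothetical, not the authors' TFS ontology), such a query walks up the category chain:

```python
# Hypothetical ontology fragment: each category points to its parent category.
# The paper's ontology is implemented in TFS; this dict is only an illustration.
PARENT = {
    "living-object": "thing",
    "nonliving-object": "thing",
    "human": "living-object",
    "animal": "living-object",
    "money": "nonliving-object",
}

# Hypothetical mapping from noun stems to their most specific category.
CATEGORY_OF = {
    "adam": "human",    # man
    "kedi": "animal",   # cat
    "para": "money",    # money
    "dolar": "money",
    "mark": "money",
}


def is_a(category: str, target: str) -> bool:
    """True if `category` equals `target` or lies below it in the hierarchy."""
    current = category
    while current is not None:
        if current == target:
            return True
        current = PARENT.get(current)  # None once we pass the root
    return False


def head_is_a(stem: str, target: str) -> bool:
    """Semantic-tier check: does a noun with this stem denote a `target`?"""
    category = CATEGORY_OF.get(stem)
    return category is not None and is_a(category, target)


print(head_is_a("dolar", "money"))         # True
print(head_is_a("adam", "living-object"))  # True
print(head_is_a("kedi", "human"))          # False
```

Walking a parent chain is the simplest possible model; in a typed feature system the same check would be expressed as subsumption between types.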

Figure 2 illustrates the simplified form of the 
constraint-sense mapping of the verb ye (eat). 

3.3 Valency Changing Transformations 

As we have already stated, we encode senses of verbs in active voice unless a verb has an idiomatic usage with obligatory passive, causative, and/or reflexive voice.² In order to handle these valency-changing transformations, we define lexical rules as shown in Figure 3.



Figure 3: Valency transformations using lexical rules (diagram panels: input CASE FRAME; Reflexivization, Causativization, and Passivization lexical rules).

This figure describes how a given case frame with its syntactic constituents is processed by a sequence of lexical rules, each stripping off a certain voice marker and then attempting unification with the lexicon for any possible sense resolution. The order of lexical rules in this figure reflects the reverse order of voice markers in Turkish verbal morphology.³ So a given case frame may have to go through three lexical rules until it finds a unifying entry in the lexicon. Unifications before going through all lexical rules are for (possibly idiomatic) senses that explicitly require various voice markings. Two additional constituents are added via these lexical rules. The AGN-OBJ (agentive object) denotes the equivalent of the by-object in passive sentences. The subject of a sentence with a causative voice-marked verb is indicated by CAUSER in the semantic frame. Our current implementation does not deal with multiple causative voice markings (which Turkish allows), or with the rather tricky surface case change of the object of causation, which depends on the transitivity of the causativized verb. In the examples and sample rules below, a voice marker can take one of three values: (i) +, indicating that the voice marker has to be taken; (ii) -, indicating that the voice marker is not taken; and (iii) nil, indicating that the voice marker must not be taken. The value nil is used only in the sense definitions in the lexicon and can unify with - but not with +.
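The rule pipeline and the three-valued voice markers can be sketched in Python as follows. This is an illustrative model, not the TFS implementation: frames are flat dicts, the rules only strip markers (argument shuffling is left out), the rule order encodes our assumption that the outermost suffix is stripped first, and the two vur senses are taken from footnote 2.

```python
from typing import Callable

Frame = dict  # hypothetical flat frame: {"STEM": ..., "PASS": ..., "CAUS": ..., "RFLX": ...}


def compatible(required: str, actual: str) -> bool:
    """Voice values: '+' taken, '-' not taken, 'nil' must not be taken.
    'nil' (used only in sense definitions) unifies with '-' but not with '+'."""
    return required == actual or {required, actual} == {"nil", "-"}


def strip(marker: str) -> Callable[[Frame], Frame]:
    """Hypothetical lexical rule that strips one voice marker if it is present.
    Real rules also shuffle arguments (Figures 4 and 5); that is omitted here."""
    def rule(frame: Frame) -> Frame:
        if frame.get(marker) != "+":
            return frame                 # marker not taken: pass through unchanged
        return {**frame, marker: "-"}
    return rule


# Assumed order: outermost suffix first (passive, causative, reflexive),
# i.e. the reverse of their order in the Turkish verb.
RULES = [strip("PASS"), strip("CAUS"), strip("RFLX")]


def matches(sense: dict, frame: Frame) -> bool:
    return sense["STEM"] == frame["STEM"] and all(
        compatible(value, frame[feature]) for feature, value in sense["voice"].items()
    )


def resolve(frame: Frame, lexicon: list) -> list:
    """Try the lexicon before any rule and after each rule.
    Returns (sense name, number of rules applied) pairs."""
    found = []
    for step in range(len(RULES) + 1):
        if step > 0:
            frame = RULES[step - 1](frame)
        for sense in lexicon:
            if matches(sense, frame) and all(sense["name"] != n for n, _ in found):
                found.append((sense["name"], step))
    return found


LEXICON = [
    {"name": "fall in love with", "STEM": "vur", "voice": {"PASS": "+"}},
    {"name": "hit", "STEM": "vur", "voice": {"PASS": "nil"}},
]
passive_frame = {"STEM": "vur", "PASS": "+", "CAUS": "-", "RFLX": "-"}
print(resolve(passive_frame, LEXICON))  # [('fall in love with', 0), ('hit', 1)]
```

For the passive frame both senses are found: the idiomatic one before any rule applies, and the ordinary hit sense (the "be hit" reading) only after the passive marker has been stripped, because nil refuses to unify with +.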



² For instance: birine vurmak (someone+DAT hit+INF) means to hit someone, whereas birine vurulmak (someone+DAT hit+PASS+INF) means to fall in love with someone.



Figure 4: The simplified passivization rule for transitive verbs (the rule relates two feature structures over STEM, CAUS, PASS, RFLX, SUBJ, DIR-OBJ, AGN-OBJ, ABL-OBJ, PRED, and ROLES; one has AGN-OBJ: nil, the other has PASS: + and DIR-OBJ: nil).



Figures 4 and 5 show two of the simpler lexical 
rules. 
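Read in the parsing direction, the two rules of Figures 4 and 5 shuffle arguments roughly as in the following Python sketch. The dict layout, the function names, and the placement of the causer in a separate CAUSER slot are assumptions for illustration; the actual rules are TFS feature structures with coindexation. The test sentence is the Çocuk adam tarafından karşıya geçirildi example from Section 3.4.

```python
import copy
from typing import Any, Optional

Frame = dict[str, Any]  # hypothetical: {"VERB": {...}, "ARGS": {...}}


def depassivize(frame: Frame) -> Optional[Frame]:
    """Sketch of the passivization rule for transitive verbs (cf. Figure 4):
    the surface subject becomes the direct object, and the agentive object
    (AGN-OBJ, the by-phrase), if present, becomes the subject."""
    if frame["VERB"].get("PASS") != "+" or "DIR-OBJ" in frame["ARGS"]:
        return None                                  # rule does not apply
    out = copy.deepcopy(frame)                       # never mutate the input
    out["VERB"]["PASS"] = "-"
    args = out["ARGS"]
    if "SUBJ" in args:
        args["DIR-OBJ"] = args.pop("SUBJ")
    if "AGN-OBJ" in args:
        args["SUBJ"] = args.pop("AGN-OBJ")
    return out


def decausativize(frame: Frame) -> Optional[Frame]:
    """Sketch of the causation rule for intransitive verbs (cf. Figure 5): the
    subject of the causative is set aside as CAUSER, and the direct object
    (the causee) becomes the subject of the base verb."""
    if frame["VERB"].get("CAUS") != "+" or "DIR-OBJ" not in frame["ARGS"]:
        return None
    out = copy.deepcopy(frame)
    out["VERB"]["CAUS"] = "-"
    args = out["ARGS"]
    out["CAUSER"] = args.pop("SUBJ", None)
    args["SUBJ"] = args.pop("DIR-OBJ")
    return out


def noun_phrase(stem: str, case: str) -> dict:
    return {"CAT": "N", "STEM": stem, "CASE": case, "AGR": "3sg", "POSS": "none"}


# Çocuk adam tarafından karşıya geçirildi.
surface = {
    "VERB": {"STEM": "geç", "CAUS": "+", "PASS": "+", "RFLX": "-"},
    "ARGS": {"SUBJ": noun_phrase("çocuk", "nom"),
             "DAT-OBJ": noun_phrase("karşı", "dat"),
             "AGN-OBJ": noun_phrase("adam", "nom")},
}
active = depassivize(surface)      # strip the passive marker first
base = decausativize(active)       # then the causative marker
print(base["ARGS"]["SUBJ"]["STEM"], base["CAUSER"]["STEM"])  # çocuk adam
```

Applied in sequence, the two rules make the child the subject of the base verb geç and set the man aside as the causer. Because the moved argument keeps its nominative case, a sense that insists on an accusative direct object would not match the depassivized frame, which is one possible way to model the observation in Section 2 that such senses have no passive.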

3.4 Examples 

In this section we present a few examples that show how one can describe a given verb sense. For the first example, the following constraints are employed:

1. VERB-IS-YE is a constraint corresponding to

   [VERB: | STEM: "ye"]

2. VERB-TAKES-NO-PASSIVE-NO-REFLEXIVE is the verb constraint

   [VERB: [PASS: nil, RFLX: nil]]



³ We have not dealt with the reciprocal/collective voice marker yet.



3. DIR-OBJ-HAS-NO-POSS is the morphological constraint [ARGS: | DIR-OBJ: | POSS: none]

4. DIR-OBJ-IS-ACC is the morphological constraint [ARGS: | DIR-OBJ: | CASE: acc]

5. NO-DATIVE-OBL-OBJ is the argument co-occurrence constraint [ARGS: | DAT-OBL: nil]

6. SUBJECT-IS-HUMAN is the semantic constraint

   [ARGS: | SUBJECT: | HEAD: | SEM: human]

7. DIR-OBJ-HEAD-LEX-KAFA is a lexical constraint [ARGS: | DIR-OBJ: | HEAD: | LEX: "kafa"]

8. SEM-GET-MENTALLY-DERANGED is the feature structure for the semantics portion

   [SUBJ:  [1]
    PRED:  "get mentally deranged"
    ROLES: [EXPER: [1]]]

Figure 5: The simplified causation rule for intransitive verbs (the rule relates two feature structures over STEM, CAUS, PASS, RFLX, SUBJ, DIR-OBJ, ABL-OBJ, PRED, and ROLES; in one, ROLES has CAUSER: nil, and in the other, CAUSER: [1]).

We can then express the constraint for the verb sense by unifying (denoted by & in TFS) all the constraints above:



SENSE-GET-MENTALLY-DERANGED :=
    VERB-IS-YE &
    VERB-TAKES-NO-PASSIVE-NO-REFLEXIVE &
    DIR-OBJ-HAS-NO-POSS & DIR-OBJ-IS-ACC &
    NO-DATIVE-OBL-OBJ & DIR-OBJ-HEAD-LEX-KAFA &
    SUBJECT-IS-HUMAN &
    SEM-GET-MENTALLY-DERANGED.
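Since the named constraints are reusable values, the & composition above can be mimicked with plain predicates. In this hypothetical Python sketch each constraint is a boolean test on a dict-based frame, conj plays the role of &, and a small set of stems stands in for the ontology. Unlike TFS unification, it only checks compatibility and does not build the SEM part; slot names such as DAT-OBJ are assumptions.

```python
from typing import Any, Callable

Frame = dict[str, Any]
Constraint = Callable[[Frame], bool]

HUMAN_STEMS = {"adam", "kadın", "çocuk"}  # hypothetical stand-in for the ontology


def conj(*constraints: Constraint) -> Constraint:
    """Analogue of TFS '&' for boolean checks: every constraint must hold."""
    return lambda frame: all(c(frame) for c in constraints)


def dir_obj(f: Frame) -> dict:
    return f["ARGS"].get("DIR-OBJ") or {}


def verb_is_ye(f: Frame) -> bool:
    return f["VERB"]["STEM"] == "ye"


def verb_takes_no_passive_no_reflexive(f: Frame) -> bool:
    # nil in the sense unifies with '-' but not with '+'
    return f["VERB"]["PASS"] != "+" and f["VERB"]["RFLX"] != "+"


def dir_obj_has_no_poss(f: Frame) -> bool:
    return dir_obj(f).get("POSS") == "none"


def dir_obj_is_acc(f: Frame) -> bool:
    return dir_obj(f).get("CASE") == "acc"


def no_dative_obl_obj(f: Frame) -> bool:
    return "DAT-OBJ" not in f["ARGS"]


def dir_obj_head_lex_kafa(f: Frame) -> bool:
    return dir_obj(f).get("STEM") == "kafa"


def subject_is_human(f: Frame) -> bool:
    return f["ARGS"].get("SUBJ", {}).get("STEM") in HUMAN_STEMS


SENSE_GET_MENTALLY_DERANGED = conj(
    verb_is_ye, verb_takes_no_passive_no_reflexive,
    dir_obj_has_no_poss, dir_obj_is_acc,
    no_dative_obl_obj, dir_obj_head_lex_kafa,
    subject_is_human,
)

frame = {
    "VERB": {"STEM": "ye", "PASS": "-", "CAUS": "-", "RFLX": "-"},
    "ARGS": {"SUBJ": {"STEM": "adam", "CASE": "nom", "POSS": "none"},
             "DIR-OBJ": {"STEM": "kafa", "CASE": "acc", "POSS": "none"}},
}
print(SENSE_GET_MENTALLY_DERANGED(frame))  # True
```

Because verb_is_ye and no_dative_obl_obj are ordinary values, another sense of ye can reuse them unchanged, which is the economy of representation the constraint sharing aims at.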

When the resulting constraint is unified with a partially specified case frame entry (an entry in which only the argument and verb entries have been specified), it supplies the unspecified SEM component(s). That is, when a partially specified case frame such as



   [VERB: [STEM: "ye"
           PASS: nil
           CAUS: -
           RFLX: nil]
    ARGS: [SUBJ:    [NP CAT: N, STEM: "adam", CASE: nom, AGR: 3SG, POSS: none]
           DIR-OBJ: [NP CAT: N, STEM: "kafa", CASE: acc, AGR: 3SG, POSS: none]]]



unifies successfully with the constraint above, the unspecified portion is instantiated, with the experiencer coindexed with the subject among the arguments.
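Instantiating SEM through coindexation is what unification provides beyond a mere compatibility check. The following Python sketch is a minimal unifier for nested-dict feature structures with tags for reentrancy (no types, no occurs check); the Tag, unify, and resolve names are hypothetical, and the sense is a simplified version of SENSE-GET-MENTALLY-DERANGED.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Tag:
    """A coindexation tag such as [1]: every occurrence denotes the same value."""
    name: str


class UnificationFailure(Exception):
    pass


def unify(a: Any, b: Any, env: dict) -> Any:
    """Unify two feature structures (nested dicts or atoms); tags are bound in env."""
    if isinstance(a, Tag) and a in env:      # bound tag: unify its value in place
        env[a] = unify(env[a], b, env)
        return a
    if isinstance(b, Tag) and b in env:
        env[b] = unify(a, env[b], env)
        return b
    if isinstance(a, Tag):                   # unbound tag: bind it to the other side
        if a != b:
            env[a] = b
        return a
    if isinstance(b, Tag):
        env[b] = a
        return b
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for feature, value in b.items():
            merged[feature] = unify(a[feature], value, env) if feature in a else value
        return merged
    if a == b:
        return a
    raise UnificationFailure(f"{a!r} does not unify with {b!r}")


def resolve(x: Any, env: dict) -> Any:
    """Replace bound tags by their values so the result can be read directly."""
    while isinstance(x, Tag) and x in env:
        x = env[x]
    if isinstance(x, dict):
        return {k: resolve(v, env) for k, v in x.items()}
    return x


X = Tag("1")
SENSE = {  # simplified SENSE-GET-MENTALLY-DERANGED
    "VERB": {"STEM": "ye"},
    "ARGS": {"SUBJ": X, "DIR-OBJ": {"STEM": "kafa", "CASE": "acc", "POSS": "none"}},
    "SEM": {"PRED": "get mentally deranged", "ROLES": {"EXPER": X}},
}
FRAME = {  # partially specified case frame: no SEM part
    "VERB": {"STEM": "ye", "PASS": "nil", "CAUS": "-", "RFLX": "nil"},
    "ARGS": {"SUBJ": {"CAT": "N", "STEM": "adam", "CASE": "nom", "AGR": "3SG", "POSS": "none"},
             "DIR-OBJ": {"CAT": "N", "STEM": "kafa", "CASE": "acc", "AGR": "3SG", "POSS": "none"}},
}
env: dict = {}
result = resolve(unify(SENSE, FRAME, env), env)
print(result["SEM"]["ROLES"]["EXPER"]["STEM"])  # adam
```

After unification the experiencer and the subject are the same value; a direct object with a stem other than kafa would instead raise UnificationFailure at the STEM feature.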



As a second example, consider the default sense of ye, corresponding to eat (something). The constraints are:

1. VERB-IS-YE is the verb constraint

   [VERB: | STEM: "ye"]

2. VERB-TAKES-NO-REFLEXIVE is the verb constraint [VERB: | RFLX: nil]

3. NO-DATIVE-OBL-OBJ is the co-occurrence constraint [ARGS: | DAT-OBL: nil]

4. DIR-OBJ-IS(optional-edible) is the disjunctive argument constraint

   [ARGS: | DIR-OBJ: | HEAD: | SEM: edible]

   (This is only explanatory; see below for how this is implemented in TFS.)

5. ABL-OBJ-IS(optional-container) is the argument constraint

   [ARGS: | ABL-OBJ: | HEAD: | SEM: container]

6. INST-OBJ-IS(optional-instrument) is the argument constraint

   [ARGS: | INST: | HEAD: | SEM: instrument]



7. SEM-EAT1 is the feature structure for the semantics portion

   [SUBJ:    [1]
    DIR-OBJ: [2]
    ABL-OBJ: [3]
    INST:    [4]
    PRED:    "to eat"
    ROLES:   [AGENT:  [1]
              THEME:  [2]
              SOURCE: [3]
              INST:   [4]]]



In most cases, there are arguments that are not obligatorily required for resolving a verb sense. These nevertheless have to be constrained, usually on semantic grounds. For instance, the direct object is not obligatory for the basic sense of ye, but it has to be an edible entity if it is present. We handle these constraints by defining a slightly more complex type hierarchy:

argument = noun-phrase |
           case-frame |
           optional.
optional = optional-edible |
           optional-container |
           optional-instrument. ...
optional-edible = nil | edible-obj.
edible-obj & noun-phrase & IS-A-EDIBLE.

where IS-A-EDIBLE is a constraint of the form [HEAD: | SEM: edible]. The optional ablative and instrumental objects are defined similarly.⁴ The sense definition then becomes:

SENSE-EAT1 :=
    VERB-IS-YE & VERB-TAKES-NO-REFLEXIVE &
    NO-DATIVE-OBL-OBJ &
    DIR-OBJ-IS(optional-edible) &
    ABL-OBJ-IS(optional-container) &
    INST-OBJ-IS(optional-instrument) & SEM-EAT1.

⁴ Note that the surface case constraints for these are defined in the basic definition of the case frame.
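The nil | edible-obj disjunction amounts to "absent, or present and of the right semantic class". A minimal Python sketch of SENSE-EAT1 under that reading follows; the SEM_CLASSES table, the helper names, and the DAT-OBJ slot name are illustrative assumptions, not part of the described system.

```python
from typing import Any, Callable

Frame = dict[str, Any]
Constraint = Callable[[Frame], bool]

# Hypothetical semantic classes for a few noun stems (stand-in for the ontology).
SEM_CLASSES = {
    "elma": {"edible"},       # apple
    "ekmek": {"edible"},      # bread
    "tabak": {"container"},   # plate
    "çatal": {"instrument"},  # fork
    "kafa": set(),            # head
}


def optional(slot: str, sem_class: str) -> Constraint:
    """optional-X = nil | X-obj: the slot may be empty (nil),
    but a filler, if present, must belong to sem_class."""
    def check(frame: Frame) -> bool:
        filler = frame["ARGS"].get(slot)
        return filler is None or sem_class in SEM_CLASSES.get(filler["STEM"], set())
    return check


def verb_is_ye(f: Frame) -> bool:
    return f["VERB"]["STEM"] == "ye"


def verb_takes_no_reflexive(f: Frame) -> bool:
    return f["VERB"]["RFLX"] != "+"


def no_dative_obl_obj(f: Frame) -> bool:
    return "DAT-OBJ" not in f["ARGS"]


SENSE_EAT1 = [verb_is_ye, verb_takes_no_reflexive, no_dative_obl_obj,
              optional("DIR-OBJ", "edible"),
              optional("ABL-OBJ", "container"),
              optional("INST", "instrument")]


def satisfies(frame: Frame, sense: list) -> bool:
    return all(constraint(frame) for constraint in sense)


verb = {"STEM": "ye", "PASS": "-", "CAUS": "-", "RFLX": "-"}
print(satisfies({"VERB": verb, "ARGS": {"SUBJ": {"STEM": "adam"}}}, SENSE_EAT1))  # True
print(satisfies({"VERB": verb, "ARGS": {"DIR-OBJ": {"STEM": "elma"},
                                         "ABL-OBJ": {"STEM": "tabak"}}}, SENSE_EAT1))  # True
print(satisfies({"VERB": verb, "ARGS": {"DIR-OBJ": {"STEM": "kafa"}}}, SENSE_EAT1))  # False
```

The optional arguments never force a slot to be filled, yet they still rule out frames such as one with kafa as the object of plain eating.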

As a more complicated example employing nested clauses, we present below the case frame for the last example in Section 2, where the verb tut (catch) is used with a clausal subject for a very specific idiomatic usage.



   [CAT:  V
    STEM: "tut"
    AGR:  3SG
    PASS: nil
    CAUS: nil
    RFLX: nil
    SUBJ: [CAT:   ...
           VFORM: future-participle
           SUBJ:  [1] [CAT: NP ...]]
    SEM:  [PRED:  "feel like doing"
           AGENT: [1]
           THEME: ...]]



In this case, the sense resolution of the embedded 
case frame is also performed concurrently with the 
case frame resolution of the top-level frame. 
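Resolving the embedded clause together with the top-level frame can be approximated by recursion, as in this Python sketch. It is a sequential stand-in for what TFS does within a single unification; the senses, the dict layout, and the example clause headed by git (go) with subject ben (I) are hypothetical.

```python
from typing import Any, Callable

Frame = dict[str, Any]
Resolver = Callable[[Frame], list]


def sense_go(frame: Frame, resolve_sub: Resolver) -> list:
    """Hypothetical plain sense of git (go): the subject is the agent."""
    if frame["VERB"]["STEM"] != "git":
        return []
    return [{"PRED": "go", "AGENT": frame["ARGS"].get("SUBJ")}]


def sense_feel_like(frame: Frame, resolve_sub: Resolver) -> list:
    """tut with 3SG agreement, active voice, and a future-participle clausal subject."""
    verb, subj = frame["VERB"], frame["ARGS"].get("SUBJ")
    active = all(verb.get(m) != "+" for m in ("PASS", "CAUS", "RFLX"))
    if (verb["STEM"] != "tut" or verb.get("AGR") != "3SG" or not active
            or not isinstance(subj, dict) or subj.get("VFORM") != "future-participle"):
        return []
    # The embedded clause is resolved with the same lexicon; each of its readings
    # becomes the THEME, and its own subject becomes the AGENT of the whole.
    return [{"PRED": "feel like doing", "AGENT": subj["ARGS"].get("SUBJ"), "THEME": inner}
            for inner in resolve_sub(subj)]


def resolve(frame: Frame, lexicon: list) -> list:
    """Collect the SEM structures that any sense in the lexicon assigns to frame."""
    sems = []
    for sense in lexicon:
        sems.extend(sense(frame, lambda sub: resolve(sub, lexicon)))
    return sems


clause = {"VFORM": "future-participle", "VERB": {"STEM": "git"},
          "ARGS": {"SUBJ": {"STEM": "ben"}}}
top = {"VERB": {"STEM": "tut", "AGR": "3SG", "PASS": "-", "CAUS": "-", "RFLX": "-"},
       "ARGS": {"SUBJ": clause}}
print(resolve(top, [sense_go, sense_feel_like]))
# [{'PRED': 'feel like doing', 'AGENT': {'STEM': 'ben'}, 'THEME': {'PRED': 'go', 'AGENT': {'STEM': 'ben'}}}]
```

If the embedded clause had several readings, each would yield its own top-level reading, mirroring the way ambiguity propagates through the nested case frames.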

The last example below illustrates the handling of valency-changing transformations, where lexical rules handle argument shuffling.

   Çocuk   adam   tarafından   karşıya             geçirildi.
   child   man    by           opposite_side+DAT   pass+CAUS+PASS+PAST+3SG

   (The child was passed to the opposite side by the man.)

The output for this sentence is:

   [CAT:  V
    STEM: "geç"
    CAUS: +
    PASS: +
    RFLX: -
    SUBJ:    [1] [NP CAT: N, STEM: "çocuk", CASE: nom, AGR: 3sg, POSS: none]
    DAT-OBJ: [2] [NP CAT: N, STEM: "karşı", CASE: dat, AGR: 3sg, POSS: none]
    AGN-OBJ: [3] [NP CAT: N, STEM: "adam",  CASE: nom, AGR: 3sg, POSS: none]
    SEM:  [PRED:   "to pass"
           AGENT:  [1]
           GOAL:   [2]
           CAUSER: [3]]]

4 Conclusions

This paper has presented a constraint-based lexicon architecture for representing and resolving verb senses and idiomatic usage in a case frame framework, using constraints on different dimensions of the available information. Economy of representation is achieved by sharing constraints across many verb sense definitions. The system has been implemented using the TFS system.